Evolutionary Parsing for a Probabilistic Context Free Grammar

Author

  • Lourdes Araujo
Abstract

Classic parsing methods use complete search techniques to find the different interpretations of a sentence. However, the size of the search space grows exponentially with the length of the sentence or text to be parsed, so exhaustive search methods can fail to reach a solution in a reasonable time. Large problems can nevertheless be solved approximately by stochastic techniques, which do not guarantee the optimum value but allow the probability of error to be adjusted by increasing the number of points explored. Genetic algorithms are among such techniques. This paper describes a probabilistic natural language parser based on a genetic algorithm. The algorithm works with a population of possible parsings for a given sentence and grammar, which serve as the chromosomes. It produces successive generations of individuals, computing their "fitness" at each step and selecting the best of them when the termination condition is reached. The paper deals with the main issues arising in the algorithm: chromosome representation and evaluation, selection and replacement strategies, and the design of genetic operators for crossover and mutation. The model has been implemented, and the results obtained for a number of sentences are presented.

Keywords: Evolutionary programming, Parsing, Probabilistic Grammar
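
The generational loop outlined in the abstract (population, fitness evaluation, selection, crossover, mutation, replacement) can be illustrated with a minimal sketch. This is not the paper's parser: fixed-length bit strings stand in for parse trees, and every name and parameter here (`evolve`, `tournament`, `crossover`, `mutate`, tournament selection, one-point crossover, single-individual elitism) is an assumption chosen for illustration only.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]  # assumption: bit strings stand in for candidate parses


def tournament(pop, fitness, rng, k=3):
    """Select the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(c, rng, rate):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in c]


def evolve(
    fitness: Callable[[Chromosome], float],
    length: int = 20,
    pop_size: int = 30,
    generations: int = 40,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> Tuple[Chromosome, List[float]]:
    rng = random.Random(seed)
    # Initial population of random candidates.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        # Replacement strategy (assumed): keep the single best individual.
        children = [max(pop, key=fitness)]
        while len(children) < pop_size:
            child = crossover(
                tournament(pop, fitness, rng),
                tournament(pop, fitness, rng),
                rng,
            )
            children.append(mutate(child, rng, mutation_rate))
        pop = children
        history.append(max(map(fitness, pop)))
    # Termination condition (assumed): a fixed number of generations.
    return max(pop, key=fitness), history


# Toy fitness: the number of 1-bits in the chromosome.
best, history = evolve(fitness=sum)
print(sum(best), history[0], history[-1])
```

In a parser built along these lines, the chromosome would instead encode a candidate parse of the sentence, and the fitness would presumably draw on the probabilities of the grammar rules used; the abstract does not give those details, so they are left out here.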

Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...

Parsing Morphologically Complex Words

We present a method for probabilistic parsing of German words. Our approach uses a morphological analyzer based on weighted finite-state transducers to segment words into lexical units and a probabilistic context-free grammar trained on a manually created set of word trees for the parsing step.

Probabilistic Context-Free Grammar Induction Based on Structural Zeros

We present a method for induction of concise and accurate probabilistic context-free grammars for efficient use in early stages of a multi-stage parsing technique. The method is based on the use of statistical tests to determine if a non-terminal combination is unobserved due to sparse data or hard syntactic constraints. Experimental results show that, using this method, high accuracies can be a...

A Probabilistic Context-free Grammar for Disambiguation in Morphological Parsing

One of the major problems one is faced with when decomposing words into their constituent parts is ambiguity: the generation of multiple analyses for one input word, many of which are implausible. In order to deal with ambiguity, the MORphological PArser MORPA is provided with a probabilistic context-free grammar (PCFG), i.e. it combines a "conventional" context-free morphological grammar to fi...

Robust German Noun Chunking With a Probabilistic Context-Free Grammar

We present a noun chunker for German which is based on a head-lexicalised probabilistic context-free grammar. A manually developed grammar was semi-automatically extended with robustness rules in order to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data by a probabilistic context-free parser. For extracting noun chunks, the parser generates all ...

Publication date: 2000